-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-32776: Fix IBM Public Cloud DNS Provider Update Logic #1133
base: master
Are you sure you want to change the base?
OCPBUGS-32776: Fix IBM Public Cloud DNS Provider Update Logic #1133
Conversation
@gcs278: This pull request references Jira Issue OCPBUGS-32776, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
b387d07
to
b07d602
Compare
/jira refresh |
@gcs278: This pull request references Jira Issue OCPBUGS-32776, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@gcs278: This pull request references Jira Issue OCPBUGS-32776, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
I missed one detail of the bug, missing instructions in the |
a057f1a
to
0716651
Compare
@lihongan Added the missing scope change instructions for the PowerVS type. /unhold |
infra failures |
0716651
to
f51b1d3
Compare
Infra issues |
pre-merge tested on OpenStack and looks good now
will test on IBMCloud later |
pre-merge tested on IBMCloud and also looks good
|
/assign |
Progressing
Condition
f51b1d3
to
7a97b9a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you for the fix!
32caad5
to
2223383
Compare
infra failures |
Removed WIP, sorry for the delay, I am open to reviews again @candita @Miciah @SzucsAti Updates since last review:
|
Keeping the hold while I wait to see what the results of adding a IBM Cloud job are openshift/release#56785 /hold |
/retest |
Now that openshift/release#56785 has merged, though it will most likely fail due to DNS Propagation issues (being tested in #1132), I'm curious to see if we see anything interesting: |
I don't think it's enabled, but just curious: |
/testwith openshift/cluster-ingress-operator/master/e2e-ibmcloud-operator #1164 |
2223383
to
396acf9
Compare
After introducing and running e2e-ibmcloud-operator on different PRs, I realized it was passing successfully without this fix. I was mistaken that our E2E would have caught this bug. However, it looks like adding a Now let's test to make sure everything works: |
396acf9
to
d93c10b
Compare
Forgot to push... |
/testwith openshift/cluster-ingress-operator/master/e2e-ibmcloud-operator openshift/cluster-ingress-operator/pull#1164 |
@gcs278,
|
/testwith openshift/cluster-ingress-operator/master/e2e-ibmcloud-operator https://github.com/openshift/cluster-ingress-operator#1164 |
multi-pr-openshift-cluster-ingress-operator-1133-openshift-cluster-ingress-operator-1164 passed which has E2E test coverage for this bug, proving it resolved the problem. |
/retest-required |
/retest |
case iov1.CNAMERecordType: | ||
inputRData, err := p.dnsService.NewResourceRecordInputRdataRdataCnameRecord(target) | ||
|
||
// TODO DNS record update should handle the case where we have an A record and want a CNAME record or vice versa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this is a carried-over TODO, but should it be done now in order to fix the bug? Or is it out of scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's out of scope for this bug since it can be fixed independently.
This TODO is for when someone changes the DNSRecord
type from CNAME to A, or visa versa, that it won't update the dns record type. That's not a big deal for OpenShift actually, because IBMCloud only creates CNAME records, but I can see it as a low priority "completeness" TODO.
} | ||
} | ||
if result == nil || result.Result == nil { | ||
return fmt.Errorf("createOrUpdateDNSRecord: ListAllDnsRecords returned nil as result") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens for the very first record created? We can't find any to update, but we need to go on to create it instead of returning an error, don't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's a good question. I found that result
and result.Result
are both initialized when creating the DNS record for the first time. The results gets filtered for the record.Spec.DNSName
, so whether it's the very first DNS record in the hosted zone isn't a factor.
I agree it's a bit odd to check for result.Result
, but the original author wrote this logic, I haven't update it as works as intended so far, and it's possible there is a subtle reason why it's here.
createDnsRecordInputOutput: dnsclient.CreateDnsRecordInputOutput{ | ||
InputId: "testCreate", | ||
OutputError: nil, | ||
OutputResponse: &core.DetailedResponse{StatusCode: http.StatusOK}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to find a way to check that the list was updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above, there is no state to check.
OutputError: errors.New("error in CreateDnsRecord"), | ||
OutputResponse: &core.DetailedResponse{StatusCode: http.StatusRequestTimeout}, | ||
}, | ||
expectErrorContains: "error in CreateDnsRecord", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really seem to be testing any error input if we're just matching the supplied error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I view OutputError
as mocking a "fake" Upstream DNS Server as if it was providing an error. This is confirming that we are getting the error back that we told the "fake" Upstream DNS Server to generate, just like what would happen if there was a real DNS Server error.
We have other test cases that test error messages produced by our own provider code, but this error checking is still valuable because it proves the functions are returning the error we provided when the fake response is StatusRequestTimeout
.
func (fdc FakeDnsClient) ListResourceRecords(listResourceRecordsOptions *dnssvcsv1.ListResourceRecordsOptions) (result *dnssvcsv1.ListResourceRecords, response *core.DetailedResponse, err error) { | ||
fakeListDnsrecordsResp := &dnssvcsv1.ListResourceRecords{} | ||
recordType := string(iov1.ARecordType) | ||
rData := map[string]interface{}{"ip": fdc.ListAllDnsRecordsInputOutput.RecordTarget} | ||
|
||
fakeListDnsrecordsResp.ResourceRecords = append(fakeListDnsrecordsResp.ResourceRecords, dnssvcsv1.ResourceRecord{ID: &fdc.ListAllDnsRecordsInputOutput.RecordName, Name: &fdc.ListAllDnsRecordsInputOutput.RecordName, Type: &recordType, Rdata: rData}) | ||
|
||
resp := &core.DetailedResponse{ | ||
StatusCode: fdc.ListAllDnsRecordsInputOutput.OutputStatusCode, | ||
Headers: map[string][]string{}, | ||
Result: result, | ||
RawResult: []byte{}, | ||
} | ||
|
||
return fakeListDnsrecordsResp, resp, fdc.ListAllDnsRecordsInputOutput.OutputError | ||
return fdc.ListAllDnsRecordsInputOutput.OutputResult, fdc.ListAllDnsRecordsInputOutput.OutputResponse, fdc.ListAllDnsRecordsInputOutput.OutputError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
listResourceRecordsOptions
is unused. Also, can you add a comment about what is actually being returned here? We haven't changed anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, listResourceRecordsOptions
was always unused. The options would only really be helpful if we were able to test the IBM Cloud DNS Provider, as it's an input struct for their API calls.
We build a ListAllDnsRecordsInputOutput
object as an input to each of the unit test cases, and it has the OutputResult
, OutputResponse
, and OutputError
. In this function, we just return them so we can continue testing our functions (delete
and createOrUpdateDNSRecord
)
I'll add a comment.
"k8s.io/utils/pointer" | ||
|
||
"github.com/IBM/go-sdk-core/v5/core" | ||
"github.com/IBM/networking-go-sdk/dnssvcsv1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment for imports not in order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll review this further after learning more about my questions in cis_provider_test.go
Both IBM public and private cloud unit tests were missing unit test coverage. This update extends test coverage for the Delete and CreateOrUpdateRecord functions. This commit provides an important point of reference for future commits that may preturb the existing functionality. Both Test_createOrUpdateDNSRecord functions previously only tested the update logic. In order to test the create logic, `CreateDNSRecord` and `CreateResourceRecord` needed to be implemented in the public and private `fake_client.go` respectively. The new test cases required the ability to control the response and results of `ListAllDnsRecords`, which were previously hardcoded in both public and private IBM cloud unit tests. Both public and private unit tests were updated to use the new OutputResults field specified in the `ListAllDnsRecordsInputOutput` struct, allowing the new test cases to specify no result (indicating no existing DNS record) so we can trigger the create logic. Lastly, various test cases were added to cover untested scenarios, such as testing CNAME Records, mismatching targets, missing record types or IDs, handling nil results, etc.
The IBM Public Cloud DNS provider (`cis_provider.go`) had a bug in `createOrUpdateDNSRecord` where it checked for the existence of a DNS record by filtering both DNS name and target. If the target was updated (e.g., due to a load balancer recreation), the logic would not match the existing DNS record. As a result, the function would attempt to create a new record, but fail because a record with that name already existed, as multiple DNS records with the same name are not allowed in IBM Cloud DNS providers. The fix is to remove the filtering by target and rely solely on filtering by name, as the name is the only attribute that needs to be unique. Additionally, the IBM DNS logic doesn't work for multiple targets and this creates unexpected and problematic results. The logic has been refactored to only create using the first target and it warns the user when multiple targets are set. This change is low risk since the Ingress Operator will never create a DNSRecord with multiple targets in `desiredDNSRecord`. Modify TestScopeChange to check for `Available=True` after deleting the service to add test coverage for this issue.
d93c10b
to
e486a51
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@candita thanks for the detailed review. Should be ready for your review again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (fdc FakeDnsClient) ListResourceRecords(listResourceRecordsOptions *dnssvcsv1.ListResourceRecordsOptions) (result *dnssvcsv1.ListResourceRecords, response *core.DetailedResponse, err error) { | ||
fakeListDnsrecordsResp := &dnssvcsv1.ListResourceRecords{} | ||
recordType := string(iov1.ARecordType) | ||
rData := map[string]interface{}{"ip": fdc.ListAllDnsRecordsInputOutput.RecordTarget} | ||
|
||
fakeListDnsrecordsResp.ResourceRecords = append(fakeListDnsrecordsResp.ResourceRecords, dnssvcsv1.ResourceRecord{ID: &fdc.ListAllDnsRecordsInputOutput.RecordName, Name: &fdc.ListAllDnsRecordsInputOutput.RecordName, Type: &recordType, Rdata: rData}) | ||
|
||
resp := &core.DetailedResponse{ | ||
StatusCode: fdc.ListAllDnsRecordsInputOutput.OutputStatusCode, | ||
Headers: map[string][]string{}, | ||
Result: result, | ||
RawResult: []byte{}, | ||
} | ||
|
||
return fakeListDnsrecordsResp, resp, fdc.ListAllDnsRecordsInputOutput.OutputError | ||
return fdc.ListAllDnsRecordsInputOutput.OutputResult, fdc.ListAllDnsRecordsInputOutput.OutputResponse, fdc.ListAllDnsRecordsInputOutput.OutputError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, listResourceRecordsOptions
was always unused. The options would only really be helpful if we were able to test the IBM Cloud DNS Provider, as it's an input struct for their API calls.
We build a ListAllDnsRecordsInputOutput
object as an input to each of the unit test cases, and it has the OutputResult
, OutputResponse
, and OutputError
. In this function, we just return them so we can continue testing our functions (delete
and createOrUpdateDNSRecord
)
I'll add a comment.
func (FakeDnsClient) CreateResourceRecord(createResourceRecordOptions *dnssvcsv1.CreateResourceRecordOptions) (result *dnssvcsv1.ResourceRecord, response *core.DetailedResponse, err error) { | ||
return nil, nil, nil | ||
func (fdc FakeDnsClient) CreateResourceRecord(createResourceRecordOptions *dnssvcsv1.CreateResourceRecordOptions) (result *dnssvcsv1.ResourceRecord, response *core.DetailedResponse, err error) { | ||
if fdc.CreateDnsRecordInputOutput.InputId != *createResourceRecordOptions.Name { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this mainly to be consistent with the existing patterns, but it ensures that the
incoming createResourceRecordOptions.Name
parameter is getting configured correctly. InputID
is equivalent to the name, it's just named generically to keep all update/create/delete structs similar.
I'll add a comment.
dnsclient "github.com/openshift/cluster-ingress-operator/pkg/dns/ibm/public/client" | ||
|
||
"k8s.io/utils/pointer" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do that, but what about https://github.com/go-imports-organizer/goio/blob/main/examples/openshift_kubernetes.yaml? Do we have an official style guide of how we are supposed to order go imports?
for _, resourceRecord := range listResult.ResourceRecords { | ||
if *resourceRecord.Name == dnsName { | ||
if resourceRecord.ID == nil { | ||
return fmt.Errorf("createOrUpdateDNSRecord: record id is nil") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added because if the resourceRecord.ID
is nil, you'll get a nil pointer dereference later when going to delete it, so no value in continuing. I added it because I saw cis_provider.go had it, but this provider didn't have it:
return fmt.Errorf("delete: record id is nil") |
case iov1.CNAMERecordType: | ||
inputRData, err := p.dnsService.NewResourceRecordInputRdataRdataCnameRecord(target) | ||
|
||
// TODO DNS record update should handle the case where we have an A record and want a CNAME record or vice versa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's out of scope for this bug since it can be fixed independently.
This TODO is for when someone changes the DNSRecord
type from CNAME to A, or visa versa, that it won't update the dns record type. That's not a big deal for OpenShift actually, because IBMCloud only creates CNAME records, but I can see it as a low priority "completeness" TODO.
} | ||
} | ||
if result == nil || result.Result == nil { | ||
return fmt.Errorf("createOrUpdateDNSRecord: ListAllDnsRecords returned nil as result") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's a good question. I found that result
and result.Result
are both initialized when creating the DNS record for the first time. The results gets filtered for the record.Spec.DNSName
, so whether it's the very first DNS record in the hosted zone isn't a factor.
I agree it's a bit odd to check for result.Result
, but the original author wrote this logic, I haven't update it as works as intended so far, and it's possible there is a subtle reason why it's here.
@gcs278: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
The IBM Public Cloud DNS provider (
cis_provider.go
) had a bug increateOrUpdateDNSRecord
where it checked for the existence of a DNS record by filtering both DNS name and target. If the target was updated (e.g., due to a load balancer recreation), the logic would not match the existing DNS record. As a result, the function would attempt to create a new record, but fail because a record with that name already existed, as multiple DNS records with the same name are not allowed.The fix is to remove the filtering by target and rely solely on filtering by name, as the name is the only attribute that needs to be unique.
Additionally, the IBM DNS logic doesn't work for multiple targets and this creates unexpected and problematic results. The logic has been refactored to only create and delete using the first target. It warns the user when multiple targets are set.
This PR also includes some unit test fix up and missing unit test coverage for the IBM CIS Provider.
This resolves the same DNS issues for public PowerVS cloud as it uses the same logic.